Boone County
Building Knowledge Graphs Towards a Global Food Systems Datahub
Gelal, Nirmal, Gautam, Aastha, Norouzi, Sanaz Saki, Giordano, Nico, Silva, Claudio Dias da Jr, Francois, Jean Ribert, Onofre, Kelsey Andersen, Nelson, Katherine, Hutchinson, Stacy, Lin, Xiaomao, Welch, Stephen, Lollato, Romulo, Hitzler, Pascal, McGinty, Hande Küçük
Sustainable agricultural production aligns with several sustainability goals established by the United Nations (UN). However, there is a lack of studies that comprehensively examine sustainable agricultural practices across various products and production methods. Such research could provide valuable insights into the diverse factors influencing the sustainability of specific crops and produce, while also identifying practices and conditions that apply to all forms of agricultural production. Beyond the research itself, the community also needs a consistent set of vocabularies. These consistent vocabularies, which represent the underlying datasets, can then be stored in a global food systems datahub. Standardized vocabularies can encode important information in the datasets for further statistical analyses and AI/ML approaches, supporting research that targets sustainable agricultural production. A structured method of representing information on sustainability, especially for wheat production, is currently unavailable. To address this gap, we are building a set of ontologies and Knowledge Graphs (KGs) that encode knowledge associated with sustainable wheat production using formal logic. The data for these knowledge graphs come from public data sources, experiments conducted at Kansas State University, and a Sustainability Workshop we organized earlier in the year, which gathered input from stakeholders across the wheat value chain. The modeling of the ontology (i.e., the schema) for the Knowledge Graph is in progress with the help of our domain experts, following a modular structure based on the KNARM methodology. In this paper, we present our preliminary results and the schemas of our Knowledge Graph and ontologies.
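A minimal sketch of the kind of encoding such a knowledge graph involves, using rdflib; all class and property names here are hypothetical illustrations, not the authors' actual KNARM-based schema.

```python
# Hypothetical example: encoding a wheat-production fact as RDF triples.
from rdflib import Graph, Literal, Namespace, RDF, RDFS, XSD

WHEAT = Namespace("http://example.org/wheat-kg#")  # placeholder namespace

g = Graph()
g.bind("wheat", WHEAT)

# Illustrative schema fragment: a field trial uses a management practice.
g.add((WHEAT.FieldTrial, RDF.type, RDFS.Class))
g.add((WHEAT.ManagementPractice, RDF.type, RDFS.Class))
g.add((WHEAT.usesPractice, RDFS.domain, WHEAT.FieldTrial))
g.add((WHEAT.usesPractice, RDFS.range, WHEAT.ManagementPractice))

# Illustrative instance data, e.g. from an experimental plot.
g.add((WHEAT.trial42, RDF.type, WHEAT.FieldTrial))
g.add((WHEAT.noTill, RDF.type, WHEAT.ManagementPractice))
g.add((WHEAT.trial42, WHEAT.usesPractice, WHEAT.noTill))
g.add((WHEAT.trial42, WHEAT.grainYieldKgHa, Literal(3200, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```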
A Comprehensive Guide to Enhancing Antibiotic Discovery Using Machine Learning Derived Bio-computation
Uppalapati, Khartik, Dandamudi, Eeshan, Ice, S. Nick, Chandra, Gaurav, Bischof, Kirsten, Lorson, Christian L., Singh, Kamal
Traditional drug discovery is a long, expensive, and complex process. Advances in Artificial Intelligence (AI) and Machine Learning (ML) are beginning to change this narrative. Here, we provide a comprehensive overview of different AI and ML tools that can be used to streamline and accelerate the drug discovery process. By using datasets to train ML algorithms, it is possible to discover drugs or drug-like compounds relatively quickly and efficiently. Additionally, we address limitations in AI-based drug discovery and development, including the scarcity of high-quality data to train AI models, as well as ethical considerations. The growing impact of AI on the pharmaceutical industry is also highlighted. Finally, we discuss how AI and ML can expedite the discovery of new antibiotics to combat worldwide antimicrobial resistance (AMR).
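A toy sketch of the kind of workflow the review describes: training a classifier on molecular fingerprint vectors to flag "active" compounds. Real pipelines would compute fingerprints from chemical structures (e.g., with RDKit); here random bit vectors and a synthetic activity label stand in.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 256))    # stand-in 256-bit fingerprints
y = (X[:, :8].sum(axis=1) > 4).astype(int)  # synthetic "activity" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```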
Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review
Wang, Jinge, Cheng, Zien, Yao, Qiuming, Liu, Li, Xu, Dong, Hu, Gangqing
The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.
COVID-19 detection from pulmonary CT scans using a novel EfficientNet with attention mechanism
Farag, Ramy, Upadhyay, Parth, Gao, Yixiang, Demby, Jacket, Montoya, Katherin Garces, Tousi, Seyed Mohamad Ali, Omotara, Gbenga, DeSouza, Guilherme
Manual analysis and diagnosis of COVID-19 from Computed Tomography (CT) images of the lungs can be time-consuming and error-prone, especially given the high volume of patients and the large number of images per patient. We therefore address the need to automate this task by developing a new deep-learning-based pipeline. Our motivation was sparked by the CVPR Workshop on "Domain Adaptation, Explainability and Fairness in AI for Medical Image Analysis", specifically the "COVID-19 Diagnosis Competition (DEF-AI-MIA COV19D)" held under the same workshop. This challenge provides an opportunity to assess our proposed pipeline for COVID-19 detection from CT scan images. The pipeline incorporates the original EfficientNet augmented with an attention mechanism: EfficientNet-AM. Unlike traditional pipelines, which rely on a pre-processing step, our pipeline takes the raw selected input images without any such step, apart from an image-selection step that simply reduces the number of CT images required for training and/or testing. Moreover, our pipeline is computationally efficient: for example, it does not incorporate a decoder for segmenting the lungs, nor does it combine different backbones or pair an RNN with a backbone, as past pipelines did. Nevertheless, our pipeline outperforms all approaches presented by other teams in last year's instance of the same challenge, at least on the validation subset of the competition dataset.
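A minimal sketch, not the authors' exact EfficientNet-AM, of attaching a simple channel-attention block to an EfficientNet backbone for binary COVID/non-COVID classification. Layer sizes follow torchvision's efficientnet_b0, whose feature extractor outputs 1280 channels; the attention design here is an assumed squeeze-and-excitation variant.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style attention over feature channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))  # global average pool -> (B, C)
        return x * w[:, :, None, None]   # reweight channels

class EfficientNetAM(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.backbone = efficientnet_b0(weights=None).features
        self.attention = ChannelAttention(1280)
        self.head = nn.Linear(1280, num_classes)

    def forward(self, x):
        f = self.attention(self.backbone(x))
        return self.head(f.mean(dim=(2, 3)))  # pooled logits

logits = EfficientNetAM()(torch.randn(2, 3, 224, 224))  # smoke test
print(logits.shape)  # torch.Size([2, 2])
```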
Reconstructing the Geometry of Random Geometric Graphs
Huang, Han, Jiradilok, Pakawut, Mossel, Elchanan
Random geometric graphs are random graph models defined on metric spaces. Such a model is defined by first sampling points from a metric space and then connecting each pair of sampled points, independently across pairs, with a probability that depends on their distance. In this work, we show how to efficiently reconstruct the geometry of the underlying space from the sampled graph under the manifold assumption, i.e., assuming that the underlying space is a low-dimensional manifold and that the connection probability is a strictly decreasing function of the Euclidean distance between the points in a given embedding of the manifold in $\mathbb{R}^N$. Our work complements a large body of work on manifold learning, where the goal is to recover a manifold from points sampled in the manifold along with their (approximate) distances.
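An illustrative sketch of the reconstruction problem, not the paper's algorithm: sample points on a circle, link pairs with probability decreasing in Euclidean distance, then recover approximate geometry from the graph alone via shortest-path distances and classical multidimensional scaling.

```python
import numpy as np
import networkx as nx
from sklearn.manifold import MDS

rng = np.random.default_rng(1)
n = 300
theta = rng.uniform(0, 2 * np.pi, n)
pts = np.c_[np.cos(theta), np.sin(theta)]  # hidden manifold: unit circle

# Connection probability strictly decreasing in distance.
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
p = np.exp(-(d / 0.3) ** 2)
adj = np.triu(rng.uniform(size=(n, n)) < p, 1)  # sample each pair once
G = nx.from_numpy_array((adj | adj.T).astype(int))

# Reconstruct from the graph only: hop distances as a proxy metric.
hops = dict(nx.all_pairs_shortest_path_length(G))
D = np.array([[hops[i].get(j, n) for j in range(n)] for i in range(n)])
emb = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
X = emb.fit_transform(D)  # embedding should resemble a circle
print(X.shape)
```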
Modeling Freight Mode Choice Using Machine Learning Classifiers: A Comparative Study Using the Commodity Flow Survey (CFS) Data
Uddin, Majbah, Anowar, Sabreena, Eluru, Naveen
This study explores the usefulness of machine learning classifiers for modeling freight mode choice. We investigate eight commonly used machine learning classifiers, namely Naive Bayes, Support Vector Machine, Artificial Neural Network, K-Nearest Neighbors, Classification and Regression Tree, Random Forest, Boosting, and Bagging, along with the classical Multinomial Logit model. The US 2012 Commodity Flow Survey data serve as the primary data source, which we augment with spatial attributes from secondary data sources. The performance of the classifiers is compared based on prediction accuracy. The study also examines the effect of sample size and training-testing data split ratios on the predictive ability of the various approaches. In addition, variable importance is estimated to determine how the variables influence freight mode choice. The results show that the tree-based ensemble classifiers perform best. Specifically, Random Forest produces the most accurate predictions, closely followed by Boosting and Bagging. With regard to variable importance, shipment characteristics such as shipment distance, industry classification of the shipper, and shipment size are the most significant factors in freight mode choice decisions.
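A schematic sketch of the comparison workflow: fit a tree-based ensemble on tabular shipment features and read off variable importances. The feature names and synthetic labels below are illustrative placeholders, not the actual CFS variable set.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "shipment_distance_mi": rng.gamma(2.0, 300.0, 5000),
    "shipment_size_lb":     rng.lognormal(6.0, 1.5, 5000),
    "naics_code":           rng.integers(0, 20, 5000),
})
# Synthetic mode label (0=truck, 1=rail, 2=air) loosely tied to distance/size.
y = np.select([X.shipment_distance_mi > 900, X.shipment_size_lb > 5000], [2, 1], 0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, rf.predict(X_te)))
print(dict(zip(X.columns, rf.feature_importances_.round(3))))
```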
Application of 2D Homography for High Resolution Traffic Data Collection using CCTV Cameras
Zhang, Linlin, Yu, Xiang, Daud, Abdulateef, Mussah, Abdul Rashid, Adu-Gyamfi, Yaw
Traffic cameras remain the primary data source for surveillance activities such as congestion and incident monitoring. To date, state agencies continue to rely on manual effort to extract data from networked cameras due to limitations of current automatic vision systems, including requirements for complex camera calibration and an inability to generate high-resolution data. This study implements a three-stage video analytics framework for extracting high-resolution traffic data, such as vehicle counts, speed, and acceleration, from infrastructure-mounted CCTV cameras. The key components of the framework are object recognition, perspective transformation, and vehicle trajectory reconstruction. First, a state-of-the-art vehicle recognition model is implemented to detect and classify vehicles. Next, to correct for camera distortion and reduce partial occlusion, an algorithm inspired by two-point linear perspective is utilized to extract the region of interest (ROI) automatically, while a 2D homography technique transforms the CCTV view to a bird's-eye view (BEV). Cameras are calibrated with a two-layer matrix system so that speed and acceleration can be extracted by converting image coordinates to real-world measurements. Individual vehicle trajectories are constructed and compared in the BEV using two time-space-feature-based object trackers, namely Motpy and BYTETrack. The results show an error rate of about +/- 4.5% for directional traffic counts and less than 10% mean squared error (MSE) for camera-based speed estimates relative to estimates from probe data sources. Extracting high-resolution data from traffic cameras has several implications, from improving traffic management to identifying dangerous driving behavior, high-risk accident locations, and other safety concerns, enabling proactive measures to reduce accidents and fatalities.
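A condensed sketch of the homography step: map four reference points in the camera image to known ground coordinates, then project tracked vehicle positions into the bird's-eye view to estimate speed. The pixel and ground coordinates below are made-up calibration values, not the study's measurements.

```python
import numpy as np
import cv2

# Four image points (px) and their ground-plane positions (meters).
img_pts = np.float32([[420, 710], [860, 705], [780, 320], [505, 325]])
gnd_pts = np.float32([[0, 0], [7.2, 0], [7.2, 60.0], [0, 60.0]])
H = cv2.getPerspectiveTransform(img_pts, gnd_pts)

def to_bev(points_px: np.ndarray) -> np.ndarray:
    """Project Nx2 image coordinates to ground-plane meters."""
    pts = points_px.reshape(-1, 1, 2).astype(np.float32)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

# Two detections of the same vehicle, one frame apart at 30 fps.
track = to_bev(np.array([[600.0, 650.0], [598.0, 628.0]]))
speed_mps = np.linalg.norm(track[1] - track[0]) * 30.0
print(f"estimated speed: {speed_mps * 3.6:.1f} km/h")
```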
3D Object Detection and High-Resolution Traffic Parameters Extraction Using Low-Resolution LiDAR Data
Zhang, Linlin, Yu, Xiang, Aboah, Armstrong, Adu-Gyamfi, Yaw
Traffic volume data collection is a crucial aspect of transportation engineering and urban planning, as it provides vital insights into traffic patterns, congestion, and infrastructure efficiency. Traditional manual methods of traffic data collection are both time-consuming and costly. The emergence of modern technologies, particularly Light Detection and Ranging (LiDAR), has revolutionized the process by enabling efficient and accurate data collection. Despite these benefits, previous studies have identified two major limitations that have impeded LiDAR's widespread adoption: the need for multiple LiDAR systems to obtain complete point cloud information for objects of interest, and the labor-intensive process of annotating 3D bounding boxes for object detection tasks. In response to these challenges, the current study proposes a framework that eliminates the need for multiple LiDAR systems and simplifies the laborious 3D annotation process. To achieve this goal, the study employs a single LiDAR system to reduce data acquisition cost and addresses the accompanying problem of missing point cloud information by developing a Point Cloud Completion (PCC) framework that fills in missing points using point density. Furthermore, we use zero-shot learning techniques to detect vehicles and pedestrians, and we propose a framework for extracting low- to high-level features of the objects of interest, such as height, acceleration, and speed. Using the 2D bounding box detections and the extracted height information, the study generates 3D bounding boxes automatically, without human intervention.
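An illustrative sketch of that final step: lifting a 2D ground-plane bounding box to a 3D box once an object height estimate is available. The coordinates and the height value are made-up examples, not the study's data.

```python
import numpy as np

def lift_to_3d(bev_box: np.ndarray, height: float) -> np.ndarray:
    """bev_box: (4, 2) ground-plane corners in meters -> (8, 3) 3D corners."""
    bottom = np.c_[bev_box, np.zeros(len(bev_box))]  # z = 0 (ground plane)
    top = bottom + np.array([0.0, 0.0, height])      # extrude upward
    return np.vstack([bottom, top])

bev_box = np.array([[2.0, 10.0], [3.8, 10.0], [3.8, 14.5], [2.0, 14.5]])
corners = lift_to_3d(bev_box, height=1.6)            # e.g., a passenger car
print(corners.shape)  # (8, 3)
```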
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder
Rasul, Rownak Ara, Saha, Promy, Bala, Diponkor, Karim, S M Rakib Ul, Abdullah, Md. Ibrahim, Saha, Bishwajit
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterized by difficulties with social interaction, communication, and repetitive activities. While its primary origin lies in genetics, early detection is crucial, and leveraging machine learning offers a promising avenue for faster and more cost-effective diagnosis. This study employs diverse machine learning methods to identify crucial ASD traits, aiming to enhance and automate the diagnostic process. We study eight state-of-the-art classification models to determine their effectiveness in ASD detection. We evaluate the models using accuracy, precision, recall, specificity, F1-score, area under the curve (AUC), kappa, and log loss metrics to find the best classifier for these binary datasets. Among all the classification models, the SVM and LR models achieve the highest accuracy of 100% on the children dataset, and the LR model produces the highest accuracy of 97.14% on the adult dataset. Our proposed ANN model provides the highest accuracy of 94.24% on the new combined dataset when the hyperparameters of each model are precisely tuned. Because almost all of the classification models, which rely on true labels, achieve high accuracy, we also investigate five popular clustering algorithms to understand model behavior in scenarios without true labels. We calculate Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Silhouette Coefficient (SC) metrics to select the best clustering models. Our evaluation finds that spectral clustering outperforms all other benchmarked clustering models on the NMI and ARI metrics while remaining comparable to the optimal SC achieved by k-means. The implemented code is available on GitHub.
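A brief sketch of the label-free evaluation described above: run spectral clustering and k-means on a feature matrix and compare NMI, ARI, and silhouette scores. Synthetic blobs stand in for the actual screening datasets.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import SpectralClustering, KMeans
from sklearn.metrics import (normalized_mutual_info_score,
                             adjusted_rand_score, silhouette_score)

X, y_true = make_blobs(n_samples=400, centers=2, random_state=7)

for name, model in [("spectral", SpectralClustering(n_clusters=2, random_state=7)),
                    ("k-means",  KMeans(n_clusters=2, n_init=10, random_state=7))]:
    labels = model.fit_predict(X)
    print(name,
          "NMI:", round(normalized_mutual_info_score(y_true, labels), 3),
          "ARI:", round(adjusted_rand_score(y_true, labels), 3),
          "SC:",  round(silhouette_score(X, labels), 3))
```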
Using Lie derivatives with dual quaternions for parallel robots
Montgomery-Smith, Stephen, Shy, Cecil
We introduce the notion of the Lie derivative in the context of dual quaternions that represent rigid motions and twists. First we define the wrench in terms of dual quaternions. Then we show how the Lie derivative helps in understanding how actuators affect an end effector in parallel robots, and we make this explicit in two cases: Stewart platforms and cable-driven parallel robots. We also show how to use Lie derivatives with the Newton-Raphson method to solve the forward kinematics problem for overconstrained parallel actuators. Finally, we derive the equations of motion of the end effector in dual quaternion form, including the effect of inertia from the actuators.
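A minimal sketch of the dual quaternion arithmetic underlying this formulation: a dual quaternion is a pair (r, d) of ordinary quaternions, multiplied by (r1, d1)(r2, d2) = (r1 r2, r1 d2 + d1 r2). This shows only the algebra and the standard encoding of a rigid motion, not the paper's Lie-derivative machinery.

```python
import numpy as np

def qmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def dq_mul(A, B):
    """Product of dual quaternions A = (r1, d1), B = (r2, d2)."""
    r1, d1 = A
    r2, d2 = B
    return (qmul(r1, r2), qmul(r1, d2) + qmul(d1, r2))

# Rigid motion as a unit dual quaternion: rotation r plus translation t,
# encoded with dual part d = 0.5 * t * r (t as a pure quaternion).
r = np.array([np.cos(np.pi/8), 0, 0, np.sin(np.pi/8)])  # 45 deg about z
t = np.array([0.0, 1.0, 2.0, 0.0])                       # translate (1, 2, 0)
motion = (r, 0.5 * qmul(t, r))
print(dq_mul(motion, motion))  # composing the motion with itself
```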